Peripheral Blood Microbiome Analysis via Noninvasive Prenatal Testing Reveals the Complexity of Circulating Microbial Cell-Free DNA

ABSTRACT While circulating cell-free DNA (cfDNA) is becoming a powerful marker for noninvasive identification of infectious pathogens in liquid biopsy specimens, a microbial cfDNA baseline in healthy individuals is urgently needed for the proper interpretation of microbial cfDNA sequencing results in clinical metagenomics. Because noninvasive prenatal testing (NIPT) shares many similarities with the sequencing protocol of metagenomics, we utilized the standard low-pass whole-genome-sequencing-based NIPT to establish a microbial cfDNA baseline in healthy people. Sequencing data from a total of 107,763 peripheral blood samples of healthy pregnant women undergoing NIPT screening were retrospectively collected and reanalyzed for microbiome DNA screening. More than 95% of exogenous cfDNA was from bacteria, 3% from eukaryotes, and 0.4% from viruses, indicating the gut/environment origins of many microorganisms. Overall and regional abundance patterns were well illustrated, with huge regional diversity and complexity, and unique interspecies and symbiotic relationships were observed for TORCH organisms (Toxoplasma gondii, others [Treponema pallidum {causing syphilis}, hepatitis B virus {HBV}, and human parvovirus B19 {HPV-B19}], rubella virus, cytomegalovirus [CMV], and herpes simplex virus [HSV]) and another common virus, Epstein-Barr virus (EBV). To sum up, our study revealed the complexity of the baseline circulating microbial cfDNA and showed that microbial cfDNA sequencing results need to be interpreted in a more comprehensive manner.

IMPORTANCE While circulating cell-free DNA (cfDNA) is becoming a powerful marker for noninvasive identification of infectious pathogens in liquid biopsy specimens, a baseline for microbial cfDNA in healthy individuals is urgently needed for the proper interpretation of microbial cfDNA sequencing results in clinical metagenomics. Standard low-pass whole-genome-sequencing-based NIPT shares many similarities with the sequencing protocol for metagenomics and could provide a microbial cfDNA baseline in healthy people; thus, a reference cfDNA data set of the human microbiome was established with sequencing data from a total of 107,763 peripheral blood samples of healthy pregnant women undergoing NIPT screening. Our study revealed the complexity of circulating microbial cfDNA and indicated that microbial cfDNA sequencing results need to be interpreted in a more comprehensive manner, especially with regard to geographic patterns and coexistence networks.

Thank you for inviting me to review the manuscript 'The peripheral blood microbiome analysis via NIPT reveals the complexity of circulating microbial cell-free DNA baselines'. My general impression is that the described study will be of interest for the research community. Innovative research data are described.
Major shortcomings are:
Title: The NIPT abbreviation should be explained in the title.

Abstract:
The abstract is not fully clear. The aims of the study, the methodology, and the research problems that the authors investigated are not very clear. The major findings or trends are only hinted at. A brief and more concrete description of the approach and experimental design, materials, concrete results, conclusions, and interpretation should be provided.
Introduction: Lines 59-63: It would be useful for readers to understand what the authors mean by the term 'baseline microbiome' and whether they also mean 'core microbiome'. Lines 63-64: 'We also investigated the bacteria and virus ecologies in human blood as well as the co-existence and interactions between these microbes.' - The results demonstrate that the study was limited to the evaluation of microbial abundance in serum samples and that only some indirect (statistical) interactions can be inferred.

Materials:
The raw sequencing data are not publicly available. The developed and applied statistical algorithms are not available for open access.
Results: Lines 308-311: Please clarify the meaning or rephrase '... medically utilized bacteria Streptomyces (246) and Streptomyces phage (247) as its center'. Also, which hypothesis do you mean? Figure 6: Specify the meaning of the numbers in the squares and octagons. A supplementary file would be useful. All figures and table are appropriate.
Discussion: Lines 383-387: 'our current study timely established a reference range of characteristic peripheral blood microorganisms, which is helpful for early detection and interpretation of infectious diseases, especially regional epidemic diseases, and provides the pathogenic evidence for clinical diagnosis.' - This statement sounds logical, but it is not supported by concrete data. It reads like a declaration. Please rephrase.
I agree with the conclusion - 'Furthermore, these results suggested that different countries and regions may have distinct normal reference intervals of microbes in peripheral blood, which may be influenced by diet, geography, and environmental factors.' The study results support it.

Thank you for submitting your paper to Microbiology Spectrum.

Spectrum00414-22
The peripheral blood microbiome analysis via NIPT reveals the complexity of circulating microbial cell-free DNA baselines

Comments and Suggestions for the Author:
In the manuscript "The peripheral blood microbiome analysis via NIPT reveals the complexity of circulating microbial cell-free DNA baselines", Tong et al. define microbial cell-free DNA (cfDNA) in >100K healthy individuals to provide a reference for interpreting levels in disease. Cell-free DNA offers the potential to diagnose infections when direct sampling is invasive or not possible. Microbial sequences were detected in nearly every sample and were predominantly bacterial in origin. The authors defined the microorganisms and potential interactions between bacterial and viral communities. Significant regional diversity in the prevalence of viruses was identified. They also constructed a weighted microorganism co-existence network.
The methods are clearly stated, the figures are clear, and the figure legends are appropriate. The microecology shifts in bacterial sequence abundance associated with the presence of TORCH virus genomes are particularly interesting. One of the study goals stated in the abstract is to establish the microbial cfDNA baseline in healthy individuals. In the methods, the authors describe establishing a method to calculate the likelihood that any test sample falls outside the normal range, using NIPT WGS data, which are sequenced at a lower depth, for future use in diagnosing sepsis. In the Availability of data and materials section, the authors state that all sequencing data, which are held by a company, will be shared upon request. Adding that their analysis code will also be shared would allow the analysis visualized in the figures of this manuscript to serve as a true reference dataset for investigators studying samples from diseased patients.
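To illustrate what "falling outside the normal range" could mean against a healthy baseline, here is a minimal sketch, assuming a simple empirical upper-tail estimate. This is not the authors' method, which is not described in this file; the function name `empirical_upper_tail` and the abundance values are hypothetical.

```python
def empirical_upper_tail(baseline, value):
    """Fraction of baseline values at least as large as `value`.

    Uses +1 smoothing so the estimate is never exactly zero.
    Assumption: a plain empirical tail estimate, not the manuscript's model.
    """
    if not baseline:
        raise ValueError("baseline must not be empty")
    at_least = sum(1 for b in baseline if b >= value)
    return (at_least + 1) / (len(baseline) + 1)


# Hypothetical normalized abundances of one taxon in healthy samples.
baseline = [0.0, 0.0, 0.1, 0.2, 0.2, 0.3, 0.4, 0.5, 0.8, 1.2]

typical = empirical_upper_tail(baseline, 0.3)   # value inside the baseline range
extreme = empirical_upper_tail(baseline, 5.0)   # value above every baseline sample
print(typical, extreme)
```

A small tail fraction flags a sample as unusual relative to the baseline; with a reference set of the size described here, the smallest attainable estimate would be far lower than in this ten-sample toy example.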
The conclusions are not fully supported by the data presented and should be reworded. The conclusion that microbial cfDNA is a potential biomarker of disease is accurate. Referring to the presence of cfDNA in the blood as blood microbiota risks creating confusion, as the presence of DNA does not necessarily imply the presence of intact bacterial microbiota. "Blood microbiota may represent or contribute to the first step in the kinetics of disease" is speculative and not a conclusion of the study. Considering this study is in healthy individuals, the presence of gut commensals doesn't indicate pathogens driving disease, and the presence of viral sequence reflects chronic infections rather than the first step of acute infection.

Minor comments:
• HHV8 is mentioned in line 237 in reference to figure 1D, but not listed in the figure.
• Suggest removing the subjective descriptor of "dramatic" when describing the regional differences in Shannon's diversity (line 241). The results show the majority are 5.3 and range from 5 to 5.6. Shannon diversity is somewhat contextual, and in disease conditions "dramatic" is used to refer to differences between >5 and <1.
• Line 263. This reviewer cannot find "Bacillus thuringiensis serovar" on Figure 3B, although it is stated as a key finding in the figure.
• Line 277. Can ClinicalFreq be defined so readers can properly interpret ClinFreq=36% and PopFreq=0.008 for HBV?
• Line 294. If the authors consider the increase in the abundance of Propionibacterium acnes to be "slight" compared with Bifidobacterium bifidum, this should be described further, as the fold change and p values are similar on Fig. 5D.
• Fig. 5E. Authors should confirm that "candidate division" indicated on the figure is intended.
• Could the authors reword line 312 "Module pink holds the largest number of viruses, mainly bacteria phages"? All viruses appear pink, so it's not clear what module refers to.
• Two blue clusters appear in Fig. 6. Consider adding "light blue" to describe the cluster in line 329, and "dark blue" to line 313 describing the bacterial cluster centered on Haemophilus phage.
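For context on the Shannon diversity comment above, the following is a minimal sketch of how the index is computed, assuming the natural-log form H = -Σ p_i ln p_i (the log base used in the manuscript is not stated here). The read counts are hypothetical and are not taken from the study.

```python
import math


def shannon_index(counts):
    """Shannon diversity H = -sum(p_i * ln(p_i)) over taxa with nonzero counts."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must sum to a positive value")
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


# Hypothetical read counts per taxon.
even_community = [10] * 200                 # 200 equally abundant taxa
dominated_community = [9900] + [1] * 100    # one taxon carries 99% of reads

print(round(shannon_index(even_community), 2))       # equals ln(200), about 5.3
print(round(shannon_index(dominated_community), 2))  # well below 1
```

Under this assumption, a value near 5.3 corresponds to roughly 200 evenly represented taxa, so a spread of 5 to 5.6 is modest compared with the collapse toward values below 1 seen when a single taxon dominates.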

Microbiology Spectrum
Dear Editors: Enclosed is our revised manuscript titled "A population-based metagenomics analysis of peripheral blood microbiome via NIPT in China" (Spectrum00414-22) for your consideration in Microbiology Spectrum.
We greatly appreciate the reviewers' valuable suggestions. Each suggestion and comment brought forward by the reviewers was carefully considered and incorporated, which helped us to improve the manuscript substantially. In the following pages we provide a point-by-point response to each comment from you and the two reviewers.
Based on the reviewers' suggestions, we added more information on the interpretation of cell-free DNA sequencing results and its limitations. We also modified the statements that might cause ambiguity, as required. We hope that the revised manuscript and our accompanying responses will be sufficient to make our manuscript suitable for publication in Microbiology Spectrum.
We look forward to hearing from you soon.

Cell-free DNA offers the potential to diagnose infections when direct sampling is invasive or not possible. Microbial sequences were detected in nearly every sample and were predominantly bacterial in origin. The authors defined the microorganisms and potential interactions between bacterial and viral communities. Significant regional diversity in the prevalence of viruses was identified. They also constructed a weighted microorganism co-existence network.
The methods are clearly stated, the figures are clear, and the figure legends are appropriate. The microecology shifts in bacterial sequence abundance associated with the presence of TORCH virus genomes are particularly interesting. One of the study goals stated in the abstract is to establish the microbial cfDNA baseline in healthy individuals. In the methods, the authors describe establishing a method to calculate the likelihood that any test sample falls outside the normal range, using NIPT WGS data, which are sequenced at a lower depth, for future use in diagnosing sepsis. In the Availability of data and materials section, the authors state that all sequencing data, which are held by a company, will be shared upon request. Adding that their analysis code will also be shared would allow the analysis visualized in the figures of this manuscript to serve as a true reference dataset for investigators studying samples from diseased patients.

Response:
We have submitted the processed data to a public repository. However, due to local regulatory policies, we are not allowed to share raw data at a massive scale. The analytical script associated with this study has also been made available via GitHub. The availability information has been updated in the revision. We also provide preview access here (https://ngdc.cncb.ac.cn/omix/preview/p9U2RQYf) before publication.
Major Comment 2: The conclusions are not fully supported by the data presented and should be reworded.
The conclusion that microbial cfDNA is a potential biomarker of disease is accurate. Referring to the presence of cfDNA in the blood as blood microbiota risks creating confusion, as the presence of DNA does not necessarily imply the presence of intact bacterial microbiota. "Blood microbiota may represent or contribute to the first step in the kinetics of disease" is speculative and not a conclusion of the study.